ABSTRACT

We have developed a multiscale structure identification algorithm for the detection of over- 
densities in galaxy data that identifies structures having radii within a user-defined range. Our 
"multiscale probability mapping" technique combines density estimation with a shape statis- 
tic to identify local peaks in the density field. This technique takes advantage of a user-defined 
range of scale sizes, which are used in constructing a coarse-grained map of the underlying 
fine-grained galaxy distribution, from which overdense structures are then identified. In this 
study we have compiled a catalogue of groups and clusters at 0.025 < z < 0.24 based on the 
Sloan Digital Sky Survey, Data Release 7, quantifying their significance and comparing with 
other catalogues. Most measured velocity dispersions for these structures lie between 50 and 
400 km s$^{-1}$. A clear trend of increasing velocity dispersion with radius from 0.2 to 1 $h^{-1}$ Mpc
is detected, confirming the lack of a sharp division between groups and clusters. A method 
for quantifying elongation is also developed to measure the elongation of group and cluster 
environments. By using our group and cluster catalogue as a coarse-grained representation of 
the galaxy distribution for structure sizes of < 1 $h^{-1}$ Mpc, we identify 53 filaments (from
an algorithmically-derived set of 100 candidates) as elongated unions of groups and clusters 
at 0.025 < z < 0.13. These filaments have morphologies that are consistent with previous 
samples studied. 

Key words: catalogues - galaxies: clusters: general - galaxies: groups: general - large-scale 
structure of Universe - methods: statistical - surveys 



1 INTRODUCTION 

Galaxy groups and clusters are important in studies of galaxy evo- 
lution and cosmology, with large samples necessary to draw robust 
conclusions about the role played by environment. Cluster cores are 
populated by redder galaxies than elsewhere, and contain a higher 
fraction of ellipticals, with a corresponding deficit in the number of
spirals and irregulars (Dressler 1980). This trend weakens with increasing
redshift, implying evolutionary processes (Dressler et al. 1997) that
are dependent on the environmental density of galaxies (Smith et al. 2005).
There is also a star formation rate-density
(SFD) relation: cluster cores contain redder galaxies with lower star
formation rates. This result appears to be a continuous function of
density, rather than a discrete step divided into "cluster" or "void"
environments (Hashimoto et al. 1998). The SFD relation is both
redshift-dependent (Wilman et al. 2005; Poggianti et al. 2006; Elbaz
et al. 2007) and scale-dependent (Balogh et al. 2004; Kauffmann et al. 2004).



The concentration of mass in a galaxy's environment or host
halo (Haas, Schaye & Jeeson-Daniel 2012) may trigger local physical
processes that influence evolution. Such processes include gas
stripping (Gunn & Gott 1972), shocks in the intracluster medium
(Moran et al. 2005), harassment (Moore et al. 1996) and galaxy-galaxy
interaction (Ostriker & Tremaine 1975). The extent of this
influence, contrasted against differences in the evolution of galax- 
ies with specific masses, has the potential to discriminate between 
models of galaxy evolution. 

The internal structure of clusters presents an important test
of numerical simulations (e.g. Lewis, Buote & Stocke 2003). As
mass tracers, clusters can also constrain cosmology through their
counts as a function of redshift (Evrard et al. 2002; Majumdar &
Mohr 2004) and highlight features of large-scale structure. Groups
and clusters are not isolated, but are connected and arranged in a
non-trivial manner by cosmic superstructures, including filaments,
walls and superclusters (e.g. de Lapparent, Geller & Huchra 1986).
Filaments may be traced by groups and clusters (e.g. Connolly et
al. 1996), and often occupy the spaces between massive clusters
(Pimbblet, Drinkwater & Hawkrigg 2004; Colberg, Krughoff &
Connolly 2005). Similarly, superclusters have been identified as
unions of smaller structures (e.g. Einasto et al. 2001), demonstrating
the use of group and cluster catalogues for the identification of
large-scale structure. 

Efficient and objective algorithms are required to find and 
quantify groups and clusters in galaxy data. Many such algorithms 
have been developed or refined in recent times to explore the wealth 
of data available through galaxy surveys (e.g. Miller et al. 2005;
Koester et al. 2007; Dong et al. 2008; Robotham et al. 2011) such
as the Sloan Digital Sky Survey (SDSS; York et al. 2000; Abazajian
et al. 2009). There are some features exhibited by most clusters that
can be exploited by such algorithms. For instance, most rich clusters
contain a group of early-type galaxies found at the centre: the
red sequence, showing up as an overdensity in colour-magnitude
space (Gladders & Yee 2000, 2005).



An empirical relation between Brightest Cluster Galaxy
(BCG) magnitude and redshift (e.g. Brough et al. 2002; Loh &
Strauss 2006) can be used to select galaxies that have expected
BCG properties in redshift, colour and magnitude, accompanied
by a spatial overdensity of galaxies (Bahcall et al. 2003). The
galaxies contained in any given cluster tend to have similar
star-formation histories and can usually be expected to group together
when plotted on colour-magnitude diagrams. This allows a search
for "colour-clustering" along with spatial clustering on the sky and
in redshift (e.g. Goto et al. 2002). The C4 algorithm (Miller et al.
2005) identifies clusters as overdensities in a seven-dimensional
position and colour space, thus minimising projection effects. Al- 
though the size of the physical spatial aperture is fixed, C4 is mul- 
tiscale in the sense that the use of colours allows the detection of 
structures with a range of sizes. 

The search for morphological, colour-magnitude and clustering
properties can be combined using high-level algorithms, including
maxBCG (Koester et al. 2007) and matched filter (Postman et
al. 1996; Kawasaki et al. 1998; Kepner et al. 1999; Gilbank et al.
2004; Dong et al. 2008). While such algorithms efficiently detect
structures with the properties they are trained to find, they are nec- 
essarily less sensitive to structures with different properties. 

If observed colour-magnitude and morphological properties 
are ignored or not available, galaxy positions alone must be used. 
A simple approach is to smooth the input galaxy distribution (e.g.
Gaussian smoothing: Balogh et al. 2004; Yoon et al. 2008). This
smoothing makes the galaxy distribution easier to interpret visually,
but in such single-scale smoothing
a scale must be chosen, and different overdensity catalogues
or properties are obtained with each possible choice. If the chosen
scale is too small, no structures are identified; but if it is too large,
all structures are joined together and become indistinguishable.

Multiscale algorithms are needed to interpret the information 
gathered on different scales and to make a choice about which 
scales are most important at which locations, removing the need 
for manual inspection of output for many scales. Such multiscale 
algorithms have already been implemented. For example, the Minimal
Spanning Tree (MST; Barrow, Bhavsar & Sonoda 1985) joins
together input galaxies such that the total edge length is minimised.
An MST approach can be used to recognise structures by separation,
the removal of all edges above a separation length, equivalent
to the friends-of-friends approach (FoF; Huchra & Geller 1982;
Bhavsar & Splinter 1996; Berlind et al. 2006) in which the separation
length is implemented by a combination of projected and
line-of-sight linking lengths. This is a way of identifying structures
on a range of scales, but the linking length is not directly
tied to the scale of structure sought. Instead, the linking length effectively
sets a threshold in density, similar to structures identified
by a density threshold in the Delaunay Tessellation Field Estimator
(van de Weygaert & Schaap 2009). Wavelet approaches (Slezak,
Bijaoui & Mars 1990; Escalera & MacGillivray 1995; Vikhlinin et
al. 1998) require the choice of an analysing wavelet, introducing
shape-dependence.

Our objective is the ability to detect structures having radii 
within a user-defined range (e.g. for finding clusters rather than fea- 
tures of large-scale structure), and to do this without over-specific 
assumptions about the properties of the target structures. We have 
developed a multiscale algorithm that may be directly tuned to be 
most sensitive to any given range of scales. By limiting the arbitrariness
of our assumptions where possible, we aim for generality
similar to that of statistical correlation function approaches
(e.g. Balian & Schaeffer 1989; Infante 1994). While probability
and scale values may be used to describe statistical properties of
the galaxy distribution, we use these quantities to map the galaxy
distribution by locating overdensities. Our multiscale probability
mapping (MSPM) approach is demonstrated by the identification of
groups and clusters, predominantly structures with projected radii
less than 1 $h^{-1}$ Mpc. These are subsequently used in an algorithmic
search for filaments. 

We introduce our new approach, detailing the algorithm, and 
its suitability for producing a coarse-grained map of the galaxy distribution,
in §2. Our implementation with SDSS data is described
and our selection choices summarised in §3. The results are presented
in §4 as a large (10443) catalogue of galaxy groups and
clusters. Measured structure properties are discussed and results
compared with previous studies in §5, along with an observed correlation
between group radius and velocity dispersion. By using
the group and cluster catalogue as a coarse-grained representation
of the galaxy distribution, we present in §6 a quantitative algorithmic
approach to identify filamentary structure, and an initial filament
catalogue. Our results are summarised in §7. Where necessary,
we have adopted $H_0 = 100h$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_M = 0.3$ and
$\Omega_\Lambda = 0.7$, though the exact choice of values does not significantly
affect results at $0 < z < 0.25$. Except where otherwise indicated,
all distances are comoving. 



2 MULTISCALE PROBABILITY MAPPING (MSPM) 

Our algorithm, MSPM, is able to locate overdensities in galaxy po- 
sitional data, where overdensities are defined as regions that are 
more dense than average, more dense than surrounding locations, 
or both. As a multiscale algorithm, MSPM is sensitive to both high- 
density small-scale features and extended regions of intermediate 
density. Unlike many previous multiscale approaches, this sensitiv- 
ity may be directly constrained to lie within a user-defined scale 
range. Aside from a sampled scale range and resolution, additional 
selection choices in our implementation are listed in Section 3.4.
MSPM comprises two distinct parts. 

(i) To retain as much useful information about the galaxy distri- 
bution as possible while at the same time minimising false detec- 
tions, a threshold is set in probability rather than density, such that 
statistically-significant regions are retained, shown in Figure 1(a).

(ii) To identify structures having radii within a user-defined 
range, a basic shape statistic is used to identify local peaks in the 
density field, shown in Figure 1(b).

Together, these two parts provide a method by which to produce a
coarse-grained map of the galaxy distribution that encompasses a
range of user-defined scale lengths, shown in Figure 1(c) and discussed
in Section 2.3. This output is distinct from smoothing, because
the input data are divided into separate regions. This method
of deriving coarse-grained representations can be applied to any 
distribution of data, not only galaxy positions, and is a technique 
that MSPM is well-suited to in its implementation. 

A precursor to MSPM has been defined and applied to a sample
of Extremely Red Galaxies (ERGs) in the Phoenix Deep Survey
by Smith et al. (2008). We have enhanced this algorithm by
automating structure identification and assignment of scale sizes. 
Our complete approach is detailed here. 



2.1 Densities and probabilities 

The input to MSPM comprises the celestial positions of a set of 
input galaxies (including redshifts, if available; Section 3), a set of
user-defined spatial distances defining a scale range and resolution,
and a set of sampling locations. The scale range should be chosen
to extend beyond the largest structures sought if structures are to be
selected on the basis of being more dense than their environments
(Section 2.3). The sampling locations may be the galaxy positions
themselves (e.g. Kepner et al. 1999), a natural choice that guarantees
sufficient and economical sampling of the survey volume.

Our first step is to obtain counts of galaxies around each sampling
location. Each of these counts (densities) $c_i$ is the number
of nearby galaxies within a radius equal to one of the user-defined
distances $r_i$. Our use of redshift information in the context of these
distances is described in Section 5. The densities obtained are then
converted to probabilities because:



(i) we want to detect structures that are statistically significant, 

(ii) density alone cannot be used to set a threshold throughout a 
survey that exhibits a varying density (such as the typical variation 
with redshift resulting from a magnitude-limited survey), and 

(iii) density contrasts (e.g. $\rho_{\rm inner}/\rho_{\rm outer}$) may be undefined in
low-density regions (where $\rho_{\rm outer} = 0$).

To compute probabilities, counts are compared with probabil- 
ity densities. Each probability density (e.g. Poisson or Gaussian) 
is the probability of obtaining each possible count of galaxies. For 
meaningful probabilities that take into account the statistics of the 
galaxy distribution, a natural choice (referred to as "empirical") is 
derived from a statistically-comparable ensemble of counts (for example,
those obtained at a similar redshift). Then the probability of
obtaining a count $c$ at some individual location, within a specific
radius $r_i$, is:

$$P_i(c) = \frac{\text{number of sampling locations with a count } c \text{ within radius } r_i}{\text{number of sampling locations in the ensemble}} \qquad (1)$$

Figure 1. A demonstration of MSPM in two dimensions, on a sample of Extremely Red Galaxies (ERGs) studied by Smith et al. (2008). The sampled
scale range is 20 arcseconds to 2 arcminutes in steps of 10 arcseconds. Each pixel corresponds to a sampling location on a uniform grid with a spacing of 20
arcseconds. Black dots are ERG positions. Sensitivity to isolated galaxies has been reduced. (a) A probability map showing overdensity probabilities calculated
using a Poisson probability density. High probabilities are displayed as white. Probability maps are sensitive to overdensities throughout the sampled scale
range. (b) A scale map with high scale values displayed as white. Scale maps are sensitive to density gradients within high-density islands. (c) A map showing
P − S, sensitive to structures that are simultaneously more dense than average and more dense than surrounding locations, subject to the sampled scale range.
Contours enclose structures that would be identified with a threshold P − S > 0.5. Some large structures are missed because they are beyond the sampled
scale range.



With a cumulative probability density $P'_i$ for each radius $r_i$ containing
a count $c_i$, we have:

$$P'_i = \sum_{c=0}^{c_i-1} P_i(c) + \frac{1}{2} P_i(c_i) \qquad (2)$$

This is approximately the fraction of the probability density with
$c < c_i$, and quantifies the probability of obtaining $c_i$ or less within
a radius $r_i$.

$P'_i$ includes $\frac{1}{2} P_i(c_i)$ so that in the case where the $i$-th count is
$n$ and all the counts within its ensemble are also $n$, the probability
is 0.5, since $P_i(c_i) = P_i(n) = 1$ (equation 1). In this way, a probability
is associated with each sampling location, creating the $i$-th
single-scale probability "map". This process is repeated for each of
the input scales $r_i$, resulting in a probability map for each scale.
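
The single-scale probability described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the catalogue: the function name `cumulative_probability` and the use of NumPy are assumptions, and the input is taken to be the counts at one radius from a single statistically comparable ensemble.

```python
import numpy as np

def cumulative_probability(counts):
    """Return the cumulative probability P'_i at one radius r_i.

    counts: integer galaxy counts c_i, one per sampling location, all
    drawn from one ensemble (e.g. one redshift slice). Hypothetical helper.
    """
    counts = np.asarray(counts, dtype=int)
    # Empirical probability density P_i(c): fraction of locations with count c.
    pdf = np.bincount(counts) / counts.size
    # Sum of P_i(c) over all c strictly below each possible count.
    below = np.concatenate(([0.0], np.cumsum(pdf)[:-1]))
    # Add half of P_i(c_i), as in the definition of P'_i.
    return below[counts] + 0.5 * pdf[counts]
```

For an ensemble in which every count is identical, this returns 0.5 at every location, matching the limiting case discussed in the text.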

2.2 Probability and scale maps 

To obtain our final probability map, we assign to each sampling
location the maximum value of probability from all of the single-scale
measurements at that location. This gives an overdensity
probability, $P$, at each sampling location:

$$P = \max(P'_i) \qquad (3)$$



If an overdensity is present on any of the sampled scales, it will be 
evident in this probability map. 

The other half of MSPM is the shape statistic, allowing for 
scale selection. The scale, S, at each sampling location is defined 
to be the radius hosting the highest probability ($P'_i$) at that location:

$$S = r(P'_i)\,\big|_{P'_i = P} \qquad (4)$$



On this "scale map", low scale values are usually associated with 
local density peaks. For sampling locations close to a peak in the local
density field, higher densities and hence higher probabilities $P'_i$
will be obtained at small radii, corresponding to the close proximity
of the density peak. Thus, the highest probability will be found
at small $r$, resulting in a low scale value $S$.

Equivalently, if the single-scale probability values ($P'_i$) at a
given sampling location are a probability function (of radius), that 
function's maximum is the overdensity probability. The scale value 
at that location is the radius at which the maximum occurs. To pre- 
vent probabilities rising where the count of objects does not, prob- 
ability functions are defined to be zero in the absence of additional 
counts enclosed with increasing radius. 
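
The construction of the probability and scale maps from the single-scale probabilities can be sketched as follows. This is a hedged illustration only: the function name `probability_and_scale` and the array layout are assumptions, ties between scales are resolved in favour of the smallest radius (an assumption, since the text does not specify tie-breaking), and the rule that zeroes the probability function where no additional counts are enclosed is not implemented here.

```python
import numpy as np

def probability_and_scale(p_single, radii):
    """Combine single-scale probabilities into (P, S) per sampling location.

    p_single: array of shape (n_locations, n_scales) holding P'_i.
    radii:    array of shape (n_scales,), in ascending order.
    Hypothetical helper for illustration.
    """
    p_single = np.asarray(p_single, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Index of the scale with the highest probability at each location;
    # argmax returns the first maximum, i.e. the smallest radius on ties.
    best = np.argmax(p_single, axis=1)
    P = p_single[np.arange(p_single.shape[0]), best]  # overdensity probability
    S = radii[best]                                   # scale of that maximum
    return P, S
```

A location whose probability peaks at the smallest sampled radius receives the lowest scale value, which is the signature of a local density peak.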

A probability map and a scale map are shown in Figure 1,
demonstrating their different but complementary functions. A probability
map by itself may join all structures together since most
locations are overdense on at least one scale within a large scale
range, or identify structures with boundaries that do not correspond
to actual density variations. The scale map compensates for this behaviour
by comparing local densities with surrounding locations,
guaranteeing contrasts within the sampled scale range. More complex
distributions than that shown in Figure 1 cause the scale map
by itself to identify structures that may not be denser than average.
Our work with SDSS does not readily produce images because an
adaptive grid is used to reduce computational effort (Section 3).
Hence, the demonstration in Figure 1 uses data from the Phoenix
Deep Survey (Hopkins et al. 2003; Smith et al. 2008) to create
two-dimensional images.



2.3 Thresholding with P and S 

The next step is to interpret the probability and scale information 
assigned to each sampling location in the form of a (P, S) pair. 
Different combinations of the two parameters can be used to locate 
various features of the galaxy distribution. 

The probability map highlights regions that are overdense 
when compared to the average density, as measured within the sampled
scale range. Assuming an empirical probability density (Section 2.1),
regions above a threshold in probability P contain a densest
fraction of the galaxy distribution by number if the sampling
locations are the input galaxy positions or by volume if the sam- 
pling locations are distributed uniformly throughout the volume. 
For example, P > 0.9 contains at least the densest 10 per cent 
in comparison with the mean density. The fraction selected by a 
threshold increases as the scale range is increased. The amount that 
such a fraction increases also rises as the correlation between large- 
and small-scale structure decreases. For instance, P > 0.9 selects
precisely the densest 10 per cent if only one scale is sampled, and 
more than 10 per cent as more scales are included in the probability 
estimator. 

The scale map highlights regions that are more dense than sur- 
rounding locations over the sampled scale range. A threshold in 
scale S will remove sensitivity to an upper portion of the sampled 
scale range that depends on the threshold. For example, S < 0.5 
removes all sampling locations where there is a peak in the probability
function $P'_i$ in the upper half of the sampled scale range.
This situation tends to occur when the highest densities are found 
at large radii, meaning that the sampling location in question is in a 
relatively underdense region for radii up to the peak in the probabil- 
ity function. Assuming an empirical probability density, regions be- 
low a threshold in S contain a densest given fraction of the galaxy 
distribution in a manner similar to a threshold in probability. 

P and S do not always correlate. A structure may, for exam- 
ple, be more dense than surrounding locations but underdense rel- 
ative to the mean density. To guarantee the detection of structures 
that are overdense compared with both the mean density and sur- 
rounding locations, both maps are required. 

P and S are interpreted jointly by subtracting S from P 
(where S is normalised to lie between zero and one). Subtraction is
used because high densities are associated with low values on the
scale map (S). Subtraction is preferable to division because it gives
the two attributes of being denser than average and more dense than
surrounding locations roughly equal weight: identical intervals in
P and S usually contain equivalent fractions of the input galaxy
distribution. A P − S map is shown in Figure 1(c).

While various thresholds in P, S or P − S may be motivated
by reasoning based on known properties of the target subset of the 
galaxy distribution, natural choices include: 

(i) P > 0.5 - denser than average, 

(ii) S < 0.5 - denser than surrounding locations, and 

(iii) P − S > 0.5 - denser than average and denser than surrounding
locations, guaranteeing P > 0.5 and S < 0.5.
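
The joint threshold can be illustrated with a short Python sketch. It assumes a linear normalisation of S that maps the smallest sampled radius to zero and the largest to one; the function names `normalise_scale` and `overdense_mask` are hypothetical and are not taken from any published code.

```python
import numpy as np

def normalise_scale(S, r_min, r_max):
    """Map physical scale values linearly onto [0, 1] (assumed linear form)."""
    return (np.asarray(S, dtype=float) - r_min) / (r_max - r_min)

def overdense_mask(P, S, r_min, r_max, threshold=0.5):
    """True where P - S exceeds the threshold, with S normalised to [0, 1]."""
    P = np.asarray(P, dtype=float)
    return P - normalise_scale(S, r_min, r_max) > threshold
```

Because P cannot exceed one and the normalised S cannot fall below zero, any location passing P − S > 0.5 necessarily has P > 0.5 and S < 0.5.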

2.4 Structure identification 

Extended structures are identified as unions of sampling locations 
(galaxies) above a chosen threshold using a friends-of-friends ap- 
proach to linking, and are not allowed to contain galaxies below 
the threshold (e.g. Section 3.2). The linking length used is the maximum
of the sampled scale range. If the galaxy locations are used
as an adaptive grid, the linked locations are identified as member 
galaxies. The centre of a structure may be associated with the peak 
of the region above the threshold, and the distance from this peak 
to the furthest member galaxy is a measure of radius (Section 4).

2.5 Creating coarse-grained distributions 

MSPM is well-suited to constructing a coarse-grained representa- 
tion of the galaxy distribution in ways that previous algorithms are 
not. This is because S is a basic shape statistic, and P allows us 
to set a low threshold that considers the statistical significance of 
structures. 

Choosing a limit of S < 0.5 (and similarly, P − S > 0.5)
selects regions that are local peaks in the density field, limiting the
"grain" size of the coarse-grained distribution to the sampled scale
range. This feature is demonstrated in Figure 1(b), in which the
grains (dark patches) are not allowed to merge on large scales. A
threshold in density (as approximated by P > 0.5; Figure 1(a))
has the potential to join high-density islands together such that the
grain size is not uniform, and not directly controlled by the user.
Since the real galaxy distribution contains structures with a variety 
of densities, an effective coarse-grained map should be sensitive to 
shape (as realised by S < 0.5), such that densities relative to sur- 
rounding locations (as well as to the mean density) are considered. 

Such an approach should attempt to retain as much useful in- 
formation about the galaxy distribution as possible, while reducing 
the level of noise. We would therefore like to set a low threshold 
while at the same time minimising false detections. In MSPM, this 
is attempted by thresholding with P rather than density, such that 
statistically-significant regions are retained. Figure 1(a) shows that
this has an effect reminiscent of smoothing. However, the regions
identified by P − S, shown in Figure 1(c), have a characteristic
grain size, a result that is not guaranteed by smoothing. 



The following sections focus on the use of MSPM in identifying
galaxy groups and clusters, although the coarse-grained representation
of the galaxy distribution produced by MSPM has additional
functionality. In Section 6 we outline previous approaches
to, and our use of MSPM in, producing such a representation for 
the purpose of identifying filaments of galaxies. 



3 APPLICATION TO SDSS DR7 



3.1 Survey Volume and Search Apertures 

We apply our MSPM approach initially to SDSS data (Data Release
7; Abazajian et al. 2009) using only the SDSS main spectroscopic
sample, omitting the Luminous Red Galaxy sample (Eisenstein
et al. 2001). To guarantee sufficient sampling while reducing
computational effort we use the input galaxy locations as an adaptive
grid (e.g. Kepner et al. 1999). Galaxies less than 2 $h^{-1}$ Mpc
from the survey edges are excluded as potential structure centres
to reduce edge effects and enable comparison of structure densities
with neighbouring volumes (Section 4.5). Because our approach relies
on a comparison with counts obtained at a similar redshift, our
comparison volumes at all redshifts must be large enough to allow 
sufficient statistical strength. We guarantee this by restricting our 
attention to z > 0.025. 

We define our galaxy search volumes as cylinders with vari- 
able radii on the sky and a fixed line-of-sight interval in redshift, 
defined as twice an empirically-obtained redshift radius. To locate 
groups and clusters rather than features of large-scale structure (e.g.
Einasto et al. 1984), our transverse radii are set at 0.2 to 2 $h^{-1}$ Mpc
in steps of 0.2 $h^{-1}$ Mpc. This defines the scale range we sample.
Although we are primarily interested in structures with radii < 1
$h^{-1}$ Mpc, we sample the larger scales so we can select structures
that are more dense than their environments. Components of larger 
structures may be detected by this approach. 

Intracluster peculiar velocities stretch structures along the line 
of sight, preventing us from interpreting redshifts as positions in 
depth on ∼ 1 $h^{-1}$ Mpc scales. Results obtained from a preliminary
analysis have been used as a guide to how deep our cylindrical volumes
should be in the line of sight. We find that a redshift radius of
10 $h^{-1}$ Mpc (Δz ∼ 0.004) is suitable throughout our range of redshift,
capturing the spread of peculiar velocities present within most
structures. Thus our cylindrical search volumes have fixed line-of-sight
depths in redshift of 20 $h^{-1}$ Mpc. A smaller redshift radius
would probably recover most of the same structures, but their ve- 
locity dispersions might be underestimated as a result of the re- 
moval of galaxies with large peculiar velocities. 
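
The cylindrical counting described above can be sketched as follows. This is a minimal illustration under stated assumptions: galaxy positions are supplied as Cartesian comoving coordinates in a flat-sky approximation (two transverse axes plus a line-of-sight axis, all in $h^{-1}$ Mpc), each galaxy is excluded from its own count, and the function name `cylinder_counts` is hypothetical. The brute-force pair matrix is only practical for small samples; a survey-sized catalogue would need a spatial index.

```python
import numpy as np

def cylinder_counts(x, y, los, radii, los_radius=10.0):
    """Count neighbours inside cylinders centred on every galaxy.

    x, y:  transverse comoving coordinates (flat-sky assumption).
    los:   line-of-sight comoving coordinate.
    radii: transverse radii of the cylinders.
    Returns an integer array of shape (n_galaxies, n_radii).
    """
    x, y, los = (np.asarray(a, dtype=float) for a in (x, y, los))
    # Pairwise transverse separations.
    d_t = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    # Pairs lying within the fixed line-of-sight half-depth.
    in_depth = np.abs(los[:, None] - los[None, :]) <= los_radius
    np.fill_diagonal(in_depth, False)  # assumption: do not count the galaxy itself
    return np.stack([(in_depth & (d_t <= r)).sum(axis=1) for r in radii], axis=1)

# Transverse radii of 0.2 to 2 h^-1 Mpc in steps of 0.2 h^-1 Mpc.
sampled_radii = 0.2 * np.arange(1, 11)
```

Each column of the returned array is one single-scale set of counts, ready to be converted to probabilities within its redshift slice.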

Our larger redshift radius may allow some structures to be 
identified as unions of physically unassociated galaxies across large
distances in the line of sight, and we use sigma-clipping (described
in Section 4.7) to mitigate this effect.

3.2 Redshift slices and inter-galaxy distances 

Our probabilities are computed using an empirical probability density
(Section 2.1). The probability associated with a particular count
is found by comparison with all other counts within a redshift slice
of width Δz = 0.005 (∼ 14 $h^{-1}$ Mpc), centred on that redshift.
This width is chosen to provide a large background comparison volume
rather than search for nearby galaxies in the line of sight, and
is not related to our redshift radius. For the area of sky available in
Data Release 7, Δz = 0.005 allows sufficient statistical strength,
while retaining sensitivity to decreasing mean density caused by
incompleteness at higher redshifts.

Since we use the galaxies as an adaptive grid, a (P, S) pair is 
associated with each galaxy, where S is normalised to lie between
0 and 1. The physical lengths 0.2 $h^{-1}$ Mpc and 2 $h^{-1}$ Mpc are transformed
to 0 and 1 respectively. To locate structures that are overdense
when compared with both the mean density and surrounding
locations, we set a threshold of P − S > 0.5. This identifies 177675
of the 619234 galaxies (29 per cent) in the original SDSS sample 
of galaxies brighter than r = 17.77 as lying in overdense environ- 
ments. 

For the purpose of structure identification throughout the sur- 
vey volume, inter-galaxy distances are defined as 



(5) 



where $d_t$ and $d_{\rm los}$ are the transverse (sky) and line-of-sight comoving
separations respectively, and a line-of-sight elongation factor
$e_{\rm los} = 10$ allows a 1 $h^{-1}$ Mpc cluster to contain galaxies apparently
up to 10 $h^{-1}$ Mpc away in the line of sight, consistent with
our 10 $h^{-1}$ Mpc redshift radius. Figure 2 shows the first structure
identified in our catalogue, demonstrating our structure identification
on real data. Galaxies above the threshold are linked together
if the distance between them is less than $d = 2$ $h^{-1}$ Mpc, the maximum
of our sampled scale range, and P − S > 0.5 for all galaxies
(if any) between them, as shown in Figure 2(c).
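
The linking step can be sketched with a simple union-find. Several points here are assumptions rather than statements from the text: the inter-galaxy distance is taken to combine the transverse separation and the line-of-sight separation divided by the elongation factor in quadrature, coordinates are flat-sky Cartesian, and the function name `link_structures` is hypothetical. The additional requirement that any galaxies lying between a linked pair also exceed the threshold, and the minimum membership cut, are omitted for brevity.

```python
import numpy as np

def link_structures(x, y, los, mask, link=2.0, e_los=10.0):
    """Friends-of-friends labels for galaxies above threshold.

    mask: boolean array, True where P - S exceeds the threshold.
    Returns one integer label per galaxy; -1 marks galaxies below threshold.
    """
    x, y, los = (np.asarray(a, dtype=float) for a in (x, y, los))
    idx = np.flatnonzero(mask)
    parent = {int(i): int(i) for i in idx}

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(idx):
        for b in idx[pos + 1:]:
            d_t = np.hypot(x[a] - x[b], y[a] - y[b])
            d_los = abs(los[a] - los[b])
            # ASSUMED distance: quadrature sum with a compressed line of sight.
            if np.hypot(d_t, d_los / e_los) < link:
                parent[find(int(a))] = find(int(b))

    labels = np.full(len(x), -1)
    roots = {}
    for i in idx:
        labels[i] = roots.setdefault(find(int(i)), len(roots))
    return labels
```

With an elongation factor of 10, two galaxies at the same sky position but 10 $h^{-1}$ Mpc apart along the line of sight are treated as 1 $h^{-1}$ Mpc apart, which is the behaviour the elongation factor is described as providing.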

3.3 Thresholds 

We require that each structure, as defined in Section 3.2, has at
least four member galaxies with magnitude r < 17.77, where
each galaxy must have P − S > 0.5, so that properties can be
measured for each structure. Collins et al. (1995) find that velocity
dispersions determined with fewer than eight radial velocities
may be inaccurate, but we have set a lower minimum membership 
threshold to retain sensitivity to poorer structures. Our minimum 

Figure 2. The first structure in our catalogue (Table 1), Abell 2151 (Abell
1958; Corwin 1974) in the Hercules supercluster (Tarenghi et al. 1979). (a)
SDSS image centred on the cluster position, showing a transverse radius
of 10.6 arcminutes (0.332 $h^{-1}$ Mpc at z = 0.036). (b) The same image
with r < 17.77 galaxies within a line-of-sight radius Δz = 0.005 of the
cluster centre marked as triangles. (c) Objects within a much larger field of
view, including a transverse radius of 4 $h^{-1}$ Mpc and the same line-of-sight
radius. Dots are r < 17.77 galaxies, plusses are galaxies at positions with
P − S > 0.5 and large crosses are MSPM structures, some of which are
only partially visible within this redshift slice.



membership requirement excludes 105652 galaxies (out of 177675
with P − S > 0.5) that, while individually falling within overdense
environments, are not members of such a group. The minimum
overdensity probability of any individual galaxy that is a
member of such a group is 0.56. The density of our structures is
compared with their environments in Section 4.4. Additionally, we
reject structures with low or unmeasurable local density contrasts
and structures with fewer than four member galaxies after line-of-sight
sigma-clipping. These additional criteria are described in Section 4
and affect less than one per cent of our candidate structures.

3.4 Summary of selection choices 

Although we have tried to minimise the assumptions made about 
groups and clusters when identifying them, such assumptions are 
impossible to eliminate entirely. We have attempted to minimise 
the arbitrariness of the choices we have made. Below we list justi- 
fications for our more significant choices, along with the selection 
effects these choices may produce in the resultant catalogue (or cite 
sections of this paper where they are given). 

(i) Sampling locations: by using the input galaxy catalogue as
an adaptive grid, we guarantee sufficient sampling while reducing
computational effort. Greater spatial sensitivity would be achieved
by using a regular lattice of sampling locations (e.g. Kim et al.,
for the case of the matched filter algorithm). The adoption of a
regular lattice for MSPM would require changes to our calculation of
probabilities to account for the inclusion of void regions. A
suitable adaptation of our approach would recover a similar catalogue
of structures.

(ii) Comparison volume: local galaxy counts are compared
with those obtained from redshift slices of width $\Delta z = 0.005$
(Section 3.2). Inspection of counts as a function of redshift in
the SDSS volume shows that a narrower slice would begin to be
affected by sample variance (often referred to as cosmic variance),
because it samples too small a volume to infer the true local
average density. A wider slice would bias the mean density estimate
through the Malmquist bias that results from the magnitude limit of
the survey. For example, the average density at the nearer edge
of the redshift slice would be higher than at the farther edge,
merely because intrinsically fainter sources are included only at
the lower redshifts.

(iii) Threshold (Section 2.3): $P - S > 0.5$ selects galaxies in
regions that are overdense relative to both the mean density and
surrounding locations. 29 per cent of our primary galaxy sample
satisfies this criterion (12 per cent remain after the minimum count
threshold is enforced). A higher threshold would reduce the false
discovery rate in our catalogue but, by including lower-significance
detections, the MSPM catalogue retains more information about the
galaxy distribution for a study of large-scale structure (Section 6).
From comparisons with other catalogues, we find that our threshold
affects the measured range of velocity dispersions, and hence
the masses of the detected structures. Catalogues constructed from
a smaller fraction of the galaxy population contain more massive
groups (Section 4.7).

(iv) Projected scale range: to locate groups and clusters rather
than features of large-scale structure, our transverse sampling radii
are set at 0.2 to 2 $h^{-1}$ Mpc in steps of 0.2 $h^{-1}$ Mpc. Our threshold
includes regions with $S < 0.5$, corresponding to radii less than
1 $h^{-1}$ Mpc, which is the characteristic radius of larger clusters
implied by the two-point correlation function (e.g. Einasto et al.
1984). Sampling larger scales would potentially identify extended
structures such as filaments.

(v) Redshift radius $r_z$: we find that $r_z = 10\,h^{-1}$ Mpc
($\Delta z \approx 0.004$) is suitable throughout our range of redshift,
capturing the spread of peculiar velocities present within most
structures, as discussed in Section 3.1.

(vi) Structure identification linking length: our linking length
of 2 $h^{-1}$ Mpc corresponds to the maximum of our projected scale
range. Since most of our structures have radii $< 1\,h^{-1}$ Mpc,
altering this large linking length has little effect on the structures we
find. Similarly, our line-of-sight elongation factor $e_{\rm los} = 10$ allows
a 1 $h^{-1}$ Mpc cluster to contain galaxies apparently up to 10 $h^{-1}$
Mpc away in the line of sight, consistent with our 10 $h^{-1}$ Mpc
redshift radius. Altering $e_{\rm los}$ would thus produce effects similar to
those of altering the redshift radius.

(vii) Minimum count of member galaxies: we require that
each structure has at least four member galaxies, as discussed in
Section 3.3.



4 GROUP AND CLUSTER CATALOGUE 

The resultant catalogue in Table 1 contains 10443 structures in the
redshift range 0.025 < z < 0.24, containing a total of 72023
member galaxies, 12 per cent of the input galaxy data. This is lower
than the 37 per cent identified in the Mr20 catalogue of Berlind et
al. (2006), but greater than the 8 per cent contained by C4 clusters
(Miller et al. 2005). Detailed comparison with these catalogues is
discussed in Section 5.4. Structures were sought at z > 0.24, but
none were found because of incompleteness.
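The galaxy counts quoted in this paper can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text (the total of 619234 input galaxies is the figure given in the caption of Figure 10); the variable and function names are illustrative.

```python
# Counts quoted in the text; the variable names are illustrative only.
n_input = 619234      # galaxies in the input sample (Figure 10 caption)
n_overdense = 177675  # galaxies with P - S > 0.5 (Section 3.3)
n_excluded = 105652   # overdense galaxies removed by the membership threshold
n_members = 72023     # member galaxies in the final catalogue


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# The excluded and retained galaxies account for every overdense galaxy.
assert n_overdense - n_excluded == n_members

print(f"overdense fraction: {percent(n_overdense, n_input):.1f} per cent")
print(f"member fraction:    {percent(n_members, n_input):.1f} per cent")
```

The two fractions round to the 29 and 12 per cent quoted for the threshold and for the final catalogue.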

For each structure, we measure a range of properties. Po- 
sitions, overdensity probabilities, counts and radii result directly 
from our structure identification procedure. Local density contrasts 
are a density-based quantification of the significance of our detec- 
tions and velocity dispersions measure the total mass in the systems 
we have found. 




Figure 3. The redshift distribution of MSPM structures in Table 1; mean:
0.086, median: 0.082. We restrict our attention to z > 0.025
and do not find any structures at z > 0.24 because of incompleteness.



4.1 Position 

Consistent with our structure selection, we quantify structures
without the use of colour-magnitude information, using only our $P - S$
values and identified member galaxies. An MSPM structure is
defined to have the same position on the sky as its member galaxy
with the highest $P - S$ value. For structures that are elongated or
asymmetric, an average position on the sky may not select the
densest part of the structure. Peaks in $P - S$ most accurately identify
the centres of structures containing at least eight member galaxies.

Our approach is analogous to the maximum density measure
of Yoon et al. (2008). Since redshifts cannot be interpreted as
precise positions on scales of $\sim 1\,h^{-1}$ Mpc, our structures have been
assigned the average redshift of the member galaxies, similar to
Berlind et al. (2006). Figure 3 shows that the redshift distribution
of our catalogue peaks at $z \sim 0.08$, lower than that for the
input galaxy data ($z \sim 0.1$). The difference is caused by the SDSS
magnitude limit: the higher-redshift galaxies that enter the sample
are more luminous and massive, and typically lie in overdense
regions, but the fainter members of such overdensities may not enter
the sample. This reduces the number of overdensities that can
be recovered at z > 0.15.
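The position assignment described above can be sketched in a few lines, assuming each member galaxy is available as a record with sky coordinates, a redshift and a $P - S$ value. The dictionary keys and the function name are hypothetical.

```python
def structure_position(members):
    """Return (ra, dec, z) for a structure.

    The sky position is that of the member with the highest P - S value;
    the redshift is the mean over all members. `members` is a list of
    dicts with the hypothetical keys 'ra', 'dec', 'z' and 'p_minus_s'.
    """
    peak = max(members, key=lambda galaxy: galaxy["p_minus_s"])
    mean_z = sum(galaxy["z"] for galaxy in members) / len(members)
    return peak["ra"], peak["dec"], mean_z
```

Taking the sky position from a single galaxy rather than an average keeps the centre on the densest part of an elongated or asymmetric structure.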



4.2 Overdensity probability 

The overdensity probability ($P$) reported for each structure in Table
1 is defined to be the value at its $P - S$ peak. Figure 4 shows that
71 per cent of our structures are detected with overdensity
probabilities of 0.9 or greater, meaning that they are in the densest 10 per
cent of the galaxy distribution, within the scale range sampled.
Because the centres of overdensities have low $S$ values, most (76 per
cent) of these high probability values result from our chosen
probability density (Section 2.1) over a radius of 0.2 $h^{-1}$ Mpc, centred
at the $P - S$ peak.

Table 1. Catalogue of MSPM groups and clusters in SDSS DR7.

ID     RA (J2000)  Dec (J2000)  z        P      N    R           LDC_0.4,2  LDC_1,2  sigma_v    Galaxy
       (deg)       (deg)                             (h^-1 Mpc)                      (km s^-1)  rho/rho_bar
(1)    (2)         (3)          (4)      (5)    (6)  (7)         (8)        (9)      (10)       (11)

1      241.3437    17.7596      0.03629  1.000  157  1.696       ...        3.9      680        513.1
2      167.7442    28.6773      0.03270  1.000  111  0.870       ...        1.6      645        385.0
3      223.2302    16.6928      0.04428  1.000  62   1.555       ...        5.9      546        659.6
4      240.5182    15.9474      0.03341  1.000  47   0.944       ...        3.6      408        532.4
5      169.1441    29.2692      0.04651  1.000  66   0.941       ...        6.7      456        583.6
6      240.5750    16.3662      0.03889  1.000  48   0.903       ...        3.5      272        379.7
7      234.9231    21.7713      0.04120  1.000  71   0.782       ...        4.2      541        475.9
8      351.1126    14.6395      0.04003  0.999  53   0.744       ...        3.0      715        391.8
9      247.1607    39.5800      0.03015  0.999  94   1.187       ...        2.3      757        351.6
10     247.5227    40.7662      0.03047  0.999  85   1.056       ...        2.3      580        353.2
1000   195.9923    35.3664      0.03443  0.865  5    0.720       3.3        1.4      249        50.4
2000   178.4335    22.3690      0.06570  0.987  5    0.342       9.6        9.6      185        118.2
3000   171.6480    3.4761       0.07444  0.907  5    1.078       18.0       2.2      267        76.4
4000   59.8634     -6.5318      0.06150  0.750  4    0.625       12.0       3.8      101        52.5
5000   134.5177    30.3496      0.08505  0.972  8    0.691       5.8        2.7      232        199.2
6000   122.9627    30.2907      0.07563  0.912  5    0.592       12.0       1.5      190        119.9
7000   159.6631    23.9223      0.09476  0.676  4    1.066       16.0       4.5      117        73.0
8000   221.1220    56.1547      0.11532  0.886  6    1.484       14.4       1.8      164        171.5
9000   166.1805    4.2125       0.14423  0.993  4    0.608       24.0       6.0      178        334.0
10000  155.2763    30.4010      0.15498  0.999  6    1.327       19.2       2.4      340        697.1

Locations and measured properties of MSPM structures. Entries are ordered by $P - S$ within slices of ascending redshift, where each slice has a width
$\Delta z = 0.025$. This table shows only a portion of our catalogue as an indication of its content. The complete catalogue can be found in the online edition of
the Journal, or at http://www.physics.usyd.edu.au/sifa/Main/MSPM/ , along with three-dimensional visualisations.
The selection criteria are described in Section 3. Columns (2) to (4): position; (5): overdensity probability at peak $P - S$; (6): count of galaxies with
r < 17.77; (7): radius enclosing region with $P - S > 0.5$; (8) to (9): local density contrasts; (10): velocity dispersion; (11): galaxy density within 0.4 $h^{-1}$
Mpc in units of the background density. Our measurements are detailed in Section 4.
For structures 1 to 10, column (8) is shown as "..." because only nine of the ten values are legible and their row assignment is uncertain; in listing
order they are 10.0, 8.8, 5.0, 10.0, 4.5, 9.0, 1.8, 3.6 and 1.8.

Figure 4. Histogram of overdensity probabilities for catalogued structures,
median: 0.946. To qualify for inclusion in our catalogue, each structure must
have an overdensity probability of at least 0.5, so that they are in the densest
half of the galaxy distribution. 71 per cent of our structures are detected with
probabilities greater than 0.9.


4.3 Count

The count of member galaxies is the number of galaxies associated
with each structure by our structure identification process.
Each member galaxy must have $P - S > 0.5$. Figure 5 shows
that most of our structures have fewer than eight members with
r < 17.77 above this threshold. Although we find that counts of
member galaxies and overdensity probabilities are correlated, 56
per cent of structures with only four member galaxies still have
probabilities greater than 0.9.

75 per cent of structures with eight or more member galaxies
(23 per cent of the full sample) have probabilities greater than
0.95. These form a high-purity subset of our catalogue, with
properties measured more accurately, as a higher detected number of
the structure members allows a more robust estimate of their radius
and velocity dispersion.



4.4 Radius 

Because of the apparent line-of-sight elongation resulting from
galaxy peculiar velocities, we cannot use redshift information to
determine the total physical extent of structures on $\sim 1\,h^{-1}$ Mpc
scales. We use instead the transverse (sky) distance from the $P - S$
peak to the furthest member galaxy. This measure will be sensitive
to random galaxy displacements, but a more significant limitation
for most of our structures is their low count of member galaxies.
Figure 6 shows that most of our measured radii fall within our range
of sampled scale values, as expected. We have found that some of
our radii are overestimated as a result of contamination by
unassociated nearby galaxies.

Figure 5. Histogram of member galaxy counts for catalogued structures,
median: 5. To qualify for our catalogue, each structure must contain at least
four member galaxies (Section 3.3); 37 per cent of our catalogue has this
count. Five per cent of our structures have 16 or more member galaxies (not
shown).

Figure 6. Histogram of transverse radii measured for catalogued structures
within our sampled scale range; mean: 0.75, median: 0.65. Two per cent of
our structures have radii greater than 2 $h^{-1}$ Mpc (not shown).

To explore the physical significance of the radii we have
measured for our structures, we have determined average galaxy
density profiles as a function of radius (Figure 7) for structures with
radii up to 1 $h^{-1}$ Mpc. Galaxies within 10 $h^{-1}$ Mpc in the line
of sight are included in the average. The density profile for each
structure is normalised to have a minimum of one, so that all
structures have equal weight. Profiles from individual structures sharing
similar radii are then averaged.

Figure 7. Profiles of the galaxy density relative to surrounding locations,
including $P - S \leq 0.5$ galaxies, within annuli centred on $P - S$ peaks
determined for our structures, averaged over 0.025 < z < 0.2. Each panel
shows an average of density profiles obtained for all structures with radii
within the indicated range. Horizontal lines indicate twice the density at
2 $h^{-1}$ Mpc, defined as the density in the annulus formed by rings of radii
1.95 and 2 $h^{-1}$ Mpc. Relative densities close to group and cluster centres
are clipped at 10.

Figure 7 shows that our radii approximate boundaries containing
regions twice as dense as the density at 2 $h^{-1}$ Mpc, $\rho_2$, defined
as the density in the annulus formed by rings of radii 1.95 and 2.0
$h^{-1}$ Mpc. Our $P - S > 0.5$ criterion selects regions that are
unusually overdense, and this corresponds approximately to regions that
satisfy $\rho > 2\rho_2$. In real space this galaxy density contrast is much
higher ($\sim 130$, Section 4.6), without averaging over a large redshift
radius. This is an empirical result, and cannot be generalised to all
input distributions.

Our large structures tend to have radii that enclose lower den- 
sities relative to the background than those enclosed by the radii 
of small structures. Outlying regions of larger structures are above 
our $P - S$ threshold because they are recognised as extended regions
of intermediate density, and thus unusual when compared
with most of the galaxy distribution. The opposite effect occurs 
for our structures with smaller radii, since large densities at small 
intergalaxy separations are common, because of the intrinsic clus- 
tering of galaxies. This trend results from our decision to use the 
locations of the galaxy distribution itself for our probability den- 
sity measurements. 



4.5 Local density contrast 

We define a local density contrast (LDC) as the density within an
inner radius divided by the density within the annulus formed by an
outer radius projected on the sky. LDCs are found for two pairs of
inner and outer radii: (0.4, 2) $h^{-1}$ Mpc and (1, 2) $h^{-1}$ Mpc.
Galaxies further than 10 $h^{-1}$ Mpc in the line of sight from the structure
position are excluded from the galaxy density. We refer to the LDC
measures with subscripts denoting the inner and outer radii in $h^{-1}$
Mpc as LDC$_{0.4,2}$ and LDC$_{1,2}$. LDC measurements allow
comparison with the density obtained over a large neighbouring volume.

Since small radii centred on individual galaxies are bound to
yield high densities, if the count of objects within the inner radius
is only one, the resulting LDC is considered unmeasurable and
reported in our catalogue as $-1$. Structures for which neither of
our LDCs can be measured are rejected from the catalogue. Of the
structures that remain, where there are no galaxies in the outer
annulus, the LDC is undefined, resulting from division by zero. These
structures are retained in the catalogue. A larger outer radius or the
inclusion of more sensitive observations would resolve this issue.
Our exclusion of all structures close to the survey edges ensures
that the outer radius is always within the survey.

Defining LDC$_{\rm max}$ = max(LDC$_{0.4,2}$, LDC$_{1,2}$), we require that
all of our structures have LDC$_{\rm max} > 2$. This constraint excludes 9
structures from our catalogue that are less than twice as dense as
surrounding locations, and all structures that do not have at least
two member galaxies within 1 $h^{-1}$ Mpc of the $P - S$ peak. On
average, LDC$_{\rm max}$ = 16.5, excluding 563 undefined (see above) LDC
values. For 90 per cent of our structures, LDC$_{0.4,2}$ > LDC$_{1,2}$. The
jagged appearance of the LDC histograms (Figure 8) is a result of
integer counts dictating preferred fractions.
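A minimal sketch of the LDC measurement follows. It assumes that the outer region is the annulus between the inner and outer radii; the text does not state this explicitly, but tabulated values such as 24.0, 19.2 and 14.4 in Table 1 are what that choice gives for small integer count ratios. The function names, arguments and the use of `None` for the undefined case are illustrative choices rather than the authors' code.

```python
def local_density_contrast(n_inner, n_annulus, r_inner, r_outer):
    """Projected density inside r_inner divided by the projected density in
    the annulus between r_inner and r_outer (radii in h^-1 Mpc).

    Returns -1.0 when no more than one galaxy lies within r_inner
    (unmeasurable) and None when the annulus is empty (undefined,
    because the ratio would require division by zero).
    """
    if n_inner <= 1:
        return -1.0
    if n_annulus == 0:
        return None
    # The factor of pi cancels in the ratio of the two areas.
    inner_area = r_inner ** 2
    annulus_area = r_outer ** 2 - r_inner ** 2
    return (n_inner / inner_area) / (n_annulus / annulus_area)


def ldc_max(ldc_a, ldc_b):
    """Larger of two LDC values, ignoring undefined (None) entries."""
    values = [value for value in (ldc_a, ldc_b) if value is not None]
    return max(values) if values else None
```

Under this assumption the (0.4, 2) pair multiplies the count ratio by 24 and the (1, 2) pair multiplies it by 3, so equal inner and annulus counts give contrasts of 24 and 3 respectively. This is the sense in which integer counts produce preferred values.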

4.6 Global density contrast 

Densities of our structures in units of the background galaxy
density have also been estimated. The density of a virialised structure
in units of the critical density $\rho_{\rm cr}$ is predicted by the spherical
collapse model (e.g. Bryan & Norman 1998; King & Mead 2011) for
$\Omega_m = 0.3$ and our median redshift z = 0.082 to be $\Delta_c = 91$.
This is an overdensity $\Delta = 304$ in units of the background density
$\Omega_m \rho_{\rm cr}$ (e.g. Voit 2005). Our data allow us to estimate the galaxy
overdensity rather than the matter overdensity. Additionally, some
galaxies will not be included in the SDSS spectroscopic survey
because of fibre collisions, avoidance of bright stars, and other
practical survey limitations, so this is not a robust measure of structure
densities relative to the mean.

For each structure, we have calculated a galaxy global density
contrast $\rho/\bar{\rho}$ as the density of galaxies within 0.4 $h^{-1}$ Mpc
divided by the density of galaxies within a redshift slice of width
$\Delta z = 0.005$ over the entire survey area. The median global
density contrast for our structures is 130.6, and its average is 185.5.
Because of the line-of-sight elongation caused by galaxy radial
motions, in counting galaxies within a radius of 0.4 $h^{-1}$ Mpc we have
included galaxies within a line-of-sight radius of 10 $h^{-1}$ Mpc. If
this count is only one, the resulting global density contrast is
considered unmeasurable and reported in our catalogue as $-1$.



4.7 Velocity dispersion 

Rather than use the radial velocities of all galaxies within an
arbitrary transverse radius (such as 1 $h^{-1}$ Mpc) to calculate
velocity dispersions ($\sigma_v$), we use only those galaxies identified by our
approach as being structure members. Equation 5 allows a large
line-of-sight linking length that may contribute member galaxies
that are far from structure centres in the line of sight. These
galaxies may not be physically associated with structures, and we use
line-of-sight sigma-clipping to remove them, for the purpose of
calculating $\sigma_v$ only. Under this procedure, $\sigma_v$ is calculated using
the radial velocities of all member galaxies. The initial $\sigma_v$ value is
used to identify outlying radial velocities, which are then removed
before $\sigma_v$ is recalculated. This iterative process is also used to
remove apparent groups that may result from the chance alignment
of unassociated galaxies in the line of sight.

Using the biweight estimator (Beers, Flynn & Gebhardt 1990),
four iterations of $\sigma$-clipping at $2\sigma$ are applied to the radial
velocities. Our large redshift radius raises the possibility of multiple
structures in the line of sight. To prevent an estimation of $\sigma_v$ for the
wrong structure, the median and mean radial velocities are fixed to
prevent them from shifting during iteration. A structure is excluded
from our catalogue entirely if fewer than four member galaxies
remain after clipping. Less than one per cent of our candidate
structures are rejected by this criterion.
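The clipping procedure can be sketched as follows. To keep the example self-contained, a plain sample standard deviation about the fixed centre stands in for the biweight estimator used in the paper, and the centre is taken to be the median of the initial velocities; both are assumptions of this sketch, as are the function and parameter names.

```python
import statistics


def clipped_dispersion(velocities, n_iter=4, n_sigma=2.0, min_members=4):
    """Iteratively clip radial velocities about a fixed centre.

    Returns (sigma, kept_velocities), or None if fewer than `min_members`
    velocities are supplied or remain after clipping. The scale is a
    sample standard deviation about the fixed centre, used here as a
    stand-in for the biweight estimator.
    """
    if len(velocities) < min_members:
        return None
    centre = statistics.median(velocities)  # held fixed for all iterations
    kept = list(velocities)

    def scale(values):
        return (sum((v - centre) ** 2 for v in values) / (len(values) - 1)) ** 0.5

    for _ in range(n_iter):
        sigma = scale(kept)
        survivors = [v for v in kept if abs(v - centre) <= n_sigma * sigma]
        if len(survivors) < min_members:
            return None  # structure would be rejected from the catalogue
        if len(survivors) == len(kept):
            break  # nothing removed, so further iterations change nothing
        kept = survivors
    return scale(kept), kept
```

Holding the centre fixed means that a second system along the line of sight cannot drag the estimate towards itself as outliers are removed.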

The range of a catalogue's $\sigma_v$ measurements is strongly
dependent on the criteria used to define and select structures. Our
$\sigma_v$ distribution is shown in Figure 9 and has a median of 183
km s$^{-1}$; structures with eight or more members have a median of
258 km s$^{-1}$. The range of our $\sigma_v$ measurements implies masses
consistent with structures ranging from poor groups to some of the
most massive clusters, with $\sigma_v > 1000$ km s$^{-1}$. Having allowed
relatively poor structures (together containing 12 per cent of the
input galaxy data) into our catalogue, we find that our median $\sigma_v$ is
comparable with those of samples constructed using similarly low
thresholds (e.g. Berlind et al. 2006, Mr20: $\sigma_v = 128$ km s$^{-1}$;
McConnachie et al. 2009: $\sigma_v = 227$ km s$^{-1}$), but lower than those
found in studies with higher thresholds (e.g. Miller et al. 2005:
$\sigma_v = 576$ km s$^{-1}$). This is evidence that we are correctly
identifying structure members. Although the range of our results is
consistent with previous measurements, most of our individual $\sigma_v$
measurements are based on fewer than eight radial velocities, and may
not be accurate.

Figure 8. (a) Histogram of LDC$_{0.4,2}$ for catalogued structures; mean: 17.9, median: 12. Seven per cent of our structures have LDC$_{0.4,2} > 40$, excluding
107 undefined values not shown (Section 4.5). (b) Histogram of LDC$_{1,2}$ for catalogued structures; mean: 5.5, median: 4. Two per cent of our structures have
LDC$_{1,2} > 20$, excluding 563 undefined values not shown (Section 4.5).

Figure 9. Open histogram: velocity dispersions for all catalogued
structures; mean: 212 km s$^{-1}$, median: 183 km s$^{-1}$. Shaded histogram:
structures with eight or more member galaxies; mean: 296 km s$^{-1}$, median: 258
km s$^{-1}$.

Figure 10. LDC values for our catalogued structures divided by the average
obtained over all galaxies in the input distribution, for 0.025 < z < 0.15.
Series are offset for clarity and $1\sigma$ bootstrap uncertainties are shown. Note
that "all galaxies" includes those found in our structures, which contain
72023 of the total 619234 galaxies. Average LDC values for all galaxies are
overestimated since they are galaxy-weighted rather than volume-weighted,
meaning that dense environments are sampled more than poor ones.



5 PURITY, RADIUS- VELOCITY DISPERSION 
RELATION AND COMPARISONS WITH OTHER 
STUDIES 

5.1 Average local density contrast 

Because the galaxy correlation function dictates that galaxies
normally see decreasing densities as a function of radius, high local
density contrast (LDC) values are only meaningful if they are also
above the average LDC in the general galaxy population. We
compare the LDCs in our catalogue with average LDCs obtained from
the whole galaxy population in Figure 10. At all redshifts, LDCs
obtained for our structures are higher than those found for the whole
galaxy population, within $1\sigma$ uncertainties. All these averages are
based on LDCs where the inner count of galaxies is at least two.

5.2 Four-member detections 

Over a third of our structures have exactly the minimum count of
four member galaxies; the impact of this threshold is discussed
in Section 4.3. Although this makes them marginal overdensities,
they have an average LDC$_{\rm max}$ value of 18.1, higher than that for
the remainder of the catalogue, 15.6. Figure 11 shows the
distribution of LDC$_{\rm max}$ values obtained for this subset. Since four-member
structures tend to be surrounded by poorer environments, their high
contrast values do not indicate higher density than more populous
structures. Higher LDCs may be produced by the relative isolation
of a system, its intrinsic richness, or both.

Figure 11. LDC$_{\rm max}$ distribution for structures with only four member
galaxies; mean 18.1, median 12. Six per cent of these four-member detections
have LDC$_{\rm max} > 40$, including 341 undefined values (Section 4.5).

5.3 Radius- velocity dispersion relation 

Characteristic properties of groups and clusters are related to their
internal dynamics and stages of evolution (Voit 2005). Older groups
are more concentrated, having formed when the universe was
denser (Navarro, Frenk & White 1997), and are more relaxed and
spherical than their younger counterparts (Ragone-Figueroa et al.
2010). Under simplifying assumptions, some properties can be
described by scaling relations. We will treat our groups as isothermal
spheres ($\rho(r) \propto r^{-2}$). More realistic mass-density profiles (e.g.
Navarro, Frenk & White 1997) are shallower at small radii and
steeper at large radii.

Assuming an isothermal distribution, velocity dispersion is an
indicator of total enclosed virial mass, which should be
proportional to $\sigma_v^3$. This mass is also proportional to the virial radius cubed
(e.g. Bryan & Norman 1998; Kitayama & Suto 1996), implying a
linear relation between radius ($R$) and $\sigma_v$. If $\Delta_c$ is the mean
internal group density in units of the critical density and $H$ is the
Hubble parameter, this relation is:

$$\sigma_v = \frac{1}{2} H \Delta_c^{1/2} R. \tag{6}$$

With the radii we have found for the structures in our sample, we
search for a relation between $\sigma_v$ and radius.
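For reference, equation (6) can be recovered from the standard relations for a singular isothermal sphere. The intermediate steps below are a sketch and are not spelled out in the text; $G$ is the gravitational constant and $\rho_{\rm cr} = 3H^2/(8\pi G)$ is the critical density.

```latex
\begin{align}
  % Mass enclosed by a singular isothermal sphere of dispersion \sigma_v
  M(<R) &= \frac{2 \sigma_v^2 R}{G}, \\
  % Mean enclosed density expressed in units of the critical density
  \frac{M(<R)}{\tfrac{4}{3}\pi R^3} &= \Delta_c \, \frac{3 H^2}{8 \pi G}
  \quad\Rightarrow\quad M(<R) = \frac{\Delta_c H^2 R^3}{2 G}, \\
  % Equating the two expressions for the enclosed mass
  \sigma_v^2 &= \frac{1}{4} \Delta_c H^2 R^2
  \quad\Rightarrow\quad \sigma_v = \frac{1}{2} H \Delta_c^{1/2} R .
\end{align}
```

At fixed $\Delta_c$ the enclosed mass scales as $R^3$ and $\sigma_v$ scales as $R$, which is why the mass is proportional to $\sigma_v^3$.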

Figure 12(a) shows $\sigma_v$ against radius for all structures in the
catalogue (79 per cent of our structures lie within the parameter
ranges shown), revealing a clear trend, albeit with much scatter.
The scatter can be partially attributed to uncertainties in the
estimates of radius and $\sigma_v$, and to the presence of groups and clusters
with various internal densities in our sample (see below). The
velocity dispersions are especially uncertain for low counts of
member galaxies. Radii are determined from the projected separation
between the $P - S$ peak and furthest member galaxy, and so are
uncertain by at least the mean separation of member galaxies.
Systems in the lower-right corner of Figure 12(a) have overestimated
radii as a result of contamination by nearby unassociated galaxies.

To quantify the trend evident in Figure 12(a), we have
performed a least-squares fit to the data as follows. Velocity
dispersions for all groups and clusters with radii between 0.2 $h^{-1}$ Mpc
and 1 $h^{-1}$ Mpc are arranged into bins of width 0.1 $h^{-1}$ Mpc.
Measurements that are more than one standard deviation from the mean
$\sigma_v$ in each bin are removed, after which 75 per cent of our data
(6537 structures) at $R < 1\,h^{-1}$ Mpc remain. Our linear fit with $1\sigma$
uncertainty, shown in Figure 12(b), is

$$\sigma_v = (304 \pm 3) R + (8 \pm 2), \tag{7}$$

where $R$ is the group or cluster radius in units of $h^{-1}$ Mpc and
$\sigma_v$ is in units of km s$^{-1}$. This fit is to the sigma-clipped, unbinned
data and is not constrained to pass through the origin. There may
also be underlying systematic uncertainties that are not reflected by
the random uncertainties shown in equation 7. The $1\sigma$ confidence
interval for the distribution of data points is ±51 km s$^{-1}$.
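The fitting procedure can be sketched as follows: bin by radius, discard points more than one standard deviation from their bin mean, then fit a straight line to the surviving unbinned points. The use of the population standard deviation within each bin and an ordinary unweighted least-squares fit are assumptions of this sketch, as are the function names.

```python
import statistics
from collections import defaultdict


def clip_within_bins(points, r_min=0.2, r_max=1.0, bin_width=0.1):
    """Drop (radius, sigma_v) points lying more than one standard deviation
    from the mean sigma_v of their radius bin."""
    bins = defaultdict(list)
    for radius, sigma_v in points:
        if r_min <= radius <= r_max:
            # Small offset guards against floating-point error at bin edges.
            index = int((radius - r_min) / bin_width + 1e-9)
            bins[index].append((radius, sigma_v))
    kept = []
    for members in bins.values():
        values = [sigma_v for _, sigma_v in members]
        mean = statistics.fmean(values)
        spread = statistics.pstdev(values)
        kept.extend(p for p in members if abs(p[1] - mean) <= spread)
    return kept


def fit_line(points):
    """Ordinary least-squares fit sigma_v = slope * R + intercept."""
    n = len(points)
    mean_r = sum(r for r, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    s_rr = sum((r - mean_r) ** 2 for r, _ in points)
    s_rs = sum((r - mean_r) * (s - mean_s) for r, s in points)
    slope = s_rs / s_rr
    return slope, mean_s - slope * mean_r
```

Clipping inside bins rather than globally keeps the cut from removing genuinely high dispersions at large radii, where the mean itself is larger.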

Although an $R$-$\sigma_v$ relation is not directly noted by Berlind et
al. (2006), we have performed an identical analysis of data from
their Mr20 group and cluster sample. Berlind et al. suggest that
their velocity dispersions are 20 per cent too low at all
multiplicities, so we apply a 20 per cent upward correction to compensate.
A linear fit through their data at $R < 1\,h^{-1}$ Mpc has a slope of
(214 ± 9) $h$ Mpc$^{-1}$ km s$^{-1}$. This slope is flatter than ours, but
the fit is consistent with our data at $R > 0.5\,h^{-1}$ Mpc and is also
shown in Figure 12(b). We stress that close agreement cannot be
expected, since $R$ and $\sigma_v$ are calculated differently by Berlind et
al., who also set an effectively lower group identification threshold.
This may partially account for their lower value, since the slope
of the $R$-$\sigma_v$ relation is a function of the mean group and cluster
density in units of the critical density $\Delta_c$ (equation 6).

To apply equation 6 to our results, a conversion of $R$ to units of
Mpc entails division by $h$, removing the need for an assumed value
of $H_0$. We also convert our radii from comoving to proper distances
assuming our median redshift z = 0.082. The slope of our $R$-$\sigma_v$
relation and equation 6 imply $\Delta_c = 43.2 \pm 1.0$. This value is
significantly lower than predicted by the spherical collapse model
for virialised systems (e.g. Bryan & Norman 1998; King & Mead
2011), which for $\Omega_m = 0.3$ and our median redshift z = 0.082 is
$\Delta_c = 91$.
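The conversion from the fitted slope to $\Delta_c$ can be reproduced by inverting equation 6. The sketch below assumes that $H$ is taken as the present-day value of 100 $h$ km s$^{-1}$ Mpc$^{-1}$, so that $h$ cancels; the text does not state which value of $H$ was used, and the function name is illustrative.

```python
def delta_c_from_slope(slope, redshift, hubble=100.0):
    """Invert sigma_v = 0.5 * H * sqrt(Delta_c) * R for Delta_c.

    slope    : fitted d(sigma_v)/dR in km/s per comoving h^-1 Mpc
    redshift : used to convert comoving radii to proper radii
    hubble   : H in units of h km/s/Mpc (assumed here to be the
               present-day value, so that h cancels)
    """
    # R_proper = R_comoving / (1 + z), so the slope per proper Mpc is larger.
    proper_slope = slope * (1.0 + redshift)
    return (2.0 * proper_slope / hubble) ** 2


print(round(delta_c_from_slope(304.0, 0.082), 1))
```

With the fitted slope of 304 and z = 0.082 this gives a value of about 43.3, which lies within the quoted $43.2 \pm 1.0$.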




Figure 12. (a) Velocity dispersion against radius ($R$) for all MSPM groups and clusters, with our linear fit. (b) Data points show mean group velocity dispersion
against radius with $1\sigma$ bootstrap uncertainties. Series are offset for clarity. Relatively few groups and clusters with at least eight member galaxies have radii
less than 0.4 $h^{-1}$ Mpc, so a reliable average cannot be obtained. Lines are fits to data with outlying velocity dispersions removed. See text for details.



We note that equation 6 assumes that groups have recently
virialised and that $R$ is the virial radius, assumptions we have not
examined in this study. Our low $\Delta_c$ could therefore imply that
many of our structures are not virialised or, alternatively, it could
arise from systematic underestimation of $\sigma_v$ or overestimation of
$R$. Moreover, the value of $\Delta_c$ implied by the data of Berlind et
al. (2006) is lower than ours, even though Berlind et al. optimised
their linking lengths to select group-member galaxies occupying
the same virialised dark matter halo. Hence, the models assumed in
determining $\Delta_c$ may be flawed, but we do not examine this issue
further.

A consistent variation of $\sigma_v$ with increasing $R$ to 1 $h^{-1}$ Mpc is
evidence that there is no firm division between groups and clusters.
An $R$-$\sigma_v$ relation could be caused by linking criteria like
equation 5, but we have ensured that the galaxies included in our
velocity dispersion measurements are member galaxies and occupy
overdense regions, as demonstrated in Figure 7. Moreover,
sigma-clipping has been used to remove galaxies spuriously included by
our large redshift radius and line-of-sight elongation factor.



Table 2. Other catalogues recovered by MSPM. 



Other Catalogue* 


Number 


Fraction (%) 


C4 


325/466 


70 


Berlind Mr2() 


212/394 


54 


Y08 


163/208 


78 



Numbers and fractions of other catalogues that are recovered by MSPM. In 
the case of C4, we compare at 2 < 0.1. In the case of Mr20, we consider the 
subset with at least four member galaxies. For Mr20 and Y08 we consider 
data at 0.09 < z < 0.10. See text for details. 

Table 3. MSPM structures found by other catalogues. 



Other Catalogue* 


Number 


Fraction (%) 


C4 


243/602 


40 


Berlind Mr20 


253/362 


70 


Y08 


162/616 


26 



Numbers and fractions of our MSPM catalogue recovered by other tech- 
niques, with the number of MSPM structures adjusted to reflect the varying 
survey areas (according to data release) and redshift intervals. In the case of 
C4, we consider the subset of MSPM structures with at least eight member 
galaxies. For Mr20 and Y08 we restrict MSPM to 0.09 < z < 0.10. See 
text for details. 



5.4 Comparison with other catalogues 

Since we only use data for which spectroscopic information is available,
we focus on comparisons with group and cluster catalogues that are
similarly derived. When comparing any two catalogues, appropriate
adjustments are made for varying survey areas and varying redshift limits
at the time of the catalogue's compilation. Counterparts are identified by
looking within cylinders (aligned with the line of sight) centred at group
and cluster centres, with transverse and line-of-sight radii of 1 and
10 h^-1 Mpc (Δz ~ 0.004) respectively. When determining the fraction of one
catalogue recovered by another, the former catalogue's candidates centred
less than 2 h^-1 Mpc on the sky from the edges of the survey area available
to both catalogues are removed so that mismatches are not caused by survey
area differences. Similar adjustments are made to account for the varying
redshift ranges of each catalogue. Our comparisons (with Miller et al.
2005; Berlind et al. 2006; Yoon et al. 2008) are summarised in Tables 2
and 3.

*C4: Miller et al. (2005); Mr20: Berlind et al. (2006); Y08: Yoon et al.
(2008).
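The cylinder-based counterpart search described above can be sketched as follows. This is only an illustration, not the code used for the catalogue comparison: it assumes a small-angle flat-sky separation, a pure Hubble-law distance d = cz/H0 with H0 = 100 h km s^-1 Mpc^-1, no handling of right-ascension wrap-around, and hypothetical function names.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s
H0 = 100.0           # km/s/Mpc, so distances are in h^-1 Mpc


def hubble_distance(z):
    """Low-redshift Hubble-law distance in h^-1 Mpc (assumed approximation)."""
    return C_KM_S * z / H0


def has_counterpart(target, candidates, r_trans=1.0, r_los=10.0):
    """True if any candidate lies inside a cylinder centred on the target.

    target, candidates: (ra_deg, dec_deg, z) tuples.
    r_trans, r_los: transverse and line-of-sight radii in h^-1 Mpc.
    """
    ra0, dec0, z0 = target
    d0 = hubble_distance(z0)
    for ra, dec, z in candidates:
        # Small-angle separation on the sky, in radians.
        dra = math.radians(ra - ra0) * math.cos(math.radians(dec0))
        ddec = math.radians(dec - dec0)
        transverse = d0 * math.hypot(dra, ddec)
        line_of_sight = abs(hubble_distance(z) - d0)
        if transverse <= r_trans and line_of_sight <= r_los:
            return True
    return False


def recovered_fraction(catalogue_a, catalogue_b):
    """Fraction of catalogue_a entries with a counterpart in catalogue_b."""
    matched = sum(has_counterpart(a, catalogue_b) for a in catalogue_a)
    return matched / len(catalogue_a)
```

Edge trimming (removing candidates within 2 h^-1 Mpc of the shared survey boundary) would be applied to `catalogue_a` before calling `recovered_fraction`; it is omitted here because it depends on the survey footprint.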



The C4 catalogue (Miller et al. 2005) is based on DR2 and offers three
centroids for sky positions. In our comparison we consider the peak in the
C4 density field since it is the closest analogue to our P − S peaks. MSPM
recovers 62 per cent (431) of the 694 C4 clusters that are more than
2 h^-1 Mpc from the survey edges. Although the C4 catalogue is confined to
the spectroscopic data, it uses the LRG sample (Eisenstein et al. 2001),
which we have omitted from our input data. The C4 catalogue is thus based
on approximately 1.5 times more data. At z < 0.1, where the LRG fraction
has fallen to seven per cent, MSPM recovers 70 per cent (325) of the 466
remaining C4 candidates.

The C4 catalogue imposes a minimum galaxy membership of 8, so to find the
fraction of our catalogue matched by C4, we consider the subset of MSPM
structures with at least eight member galaxies. Above the minimum C4
redshift of 0.03, 40 per cent (243) of our structures with eight or more
members (numbering 602 in DR2) are matched by C4. We attribute this low
recovery rate to our multiscale approach and to our effectively lower
threshold. Miller et al. (2005) use apertures with a fixed transverse
radius of 1 h^-1 Mpc to search for clusters, whereas we search over a range
of scales. Our threshold selects groups and clusters that contain 12 per
cent of the input galaxy data, whereas C4 clusters contain 8 per cent. Our
median velocity dispersion for MSPM structures with at least eight members
(258 km s^-1) is also far lower than in the C4 catalogue (576 km s^-1),
indicating that we are finding more poor groups.



Berlind et al. (2006) use a friends-of-friends algorithm to construct a
group and cluster catalogue based on DR3 data, using average member galaxy
positions for their centroids. We compare with their Mr20 sample. Since
Mr20 is based on volume-limited input, our comparison is carried out at
0.09 < z < 0.10 so that our flux-limited catalogue is based on roughly
equivalent data. MSPM recovers 54 per cent (212) of the 394 Mr20 groups and
clusters that are more than 2 h^-1 Mpc from the survey edges at
0.09 < z < 0.10 and that have at least four member galaxies (to match our
minimum galaxy membership). The Mr20 groups that MSPM fails to recover are
judged by our approach to be less dense than surrounding locations
(S > 0.5) within 1 h^-1 Mpc. These groups are still local density peaks
when compared with smaller-scale environments. At 0.09 < z < 0.10, 70 per
cent (253) of the MSPM catalogue (numbering 362 in DR3) is matched by Mr20.

Yoon et al. (2008; hereafter Y08) follow a Gaussian weighting scheme to
measure densities and construct a cluster catalogue based on DR5 data. Like
the Berlind Mr20 sample, the Y08 catalogue is based on volume-limited
input, so our comparison is carried out at 0.09 < z < 0.10. We compare MSPM
positions with the sky positions of their maximum-density galaxies and
their Gaussian-fitted redshifts, recovering 78 per cent (163) of the 208
Y08 clusters at 0.09 < z < 0.10 that are more than 2 h^-1 Mpc from the
survey edges. By inspecting mismatches we find that the Y08 clusters we
fail to recover are probably real systems, but are not concentrated enough
for the MSPM catalogue. The Y08 catalogue allows galaxies in the
photometric-only portion of the SDSS data to contribute to their
detections, which may account for at least some of the mismatches. At
0.09 < z < 0.10, 26 per cent (162) of the MSPM catalogue (numbering 616 in
DR5) is matched by Y08. A much higher effective threshold is enforced by
the Y08 catalogue, which contains roughly one third as many structures at
0.09 < z < 0.10 as the MSPM catalogue. Moreover, we have sampled a range of
scales whereas Y08 follow a Gaussian weighting scheme with a fixed
transverse σ of 0.7 h^-1 Mpc.

Our comparisons show that MSPM recovers most groups and clusters contained
in catalogues based on similar data. However, the relatively low threshold
we have set means that many candidate MSPM structures are not detected in
other catalogues. Nevertheless, comparison of our structure LDCs with
averages (Figure 10) and the radius-σ_v correlation (Section 5.3) are
evidence for the reality of the MSPM structures. Moreover, Figure 7 shows
that our groups and clusters are a subset of regions that have twice the
density at 2 h^-1 Mpc (averaged over a large line-of-sight distance). It
remains probable that false detections and overdensities that are not
gravitationally bound have been introduced by our relatively low threshold
but, by including lower-significance detections, the MSPM catalogue retains
more information about the galaxy distribution for a study of large-scale
structure.
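As a quick consistency check, the percentages in Tables 2 and 3 follow directly from the matched and total counts quoted in the text. The short sketch below simply recomputes them; the dictionary and function names are illustrative.

```python
# (matched, total) counts as quoted in Tables 2 and 3.
recovered_by_mspm = {"C4": (325, 466), "Berlind Mr20": (212, 394), "Y08": (163, 208)}
mspm_found_by_other = {"C4": (243, 602), "Berlind Mr20": (253, 362), "Y08": (162, 616)}


def percent(matched, total):
    """Matched fraction as a percentage, rounded to the nearest integer."""
    return round(100.0 * matched / total)


for label, table in (("Table 2", recovered_by_mspm), ("Table 3", mspm_found_by_other)):
    for name, (matched, total) in table.items():
        print(f"{label} {name}: {matched}/{total} = {percent(matched, total)} per cent")
```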



6 FILAMENTARY STRUCTURES 

We can treat the MSPM group and cluster catalogue (Table 1) as a
coarse-grained representation of the galaxy distribution (Section 2.5) with
structure sizes of < 1 h^-1 Mpc, made possible by our range of sampled
scales and threshold in P − S. This demonstrates a use for MSPM's
sensitivity to a user-defined scale range. Treating groups and clusters as
particles improves the numerical and computational tractability of
large-scale structure studies, suppresses noise contributed by isolated
galaxies and reduces the prominence of apparent structures formed by
line-of-sight peculiar motions. Similar approaches have previously been
adopted by Colberg (2007) and Zhang et al. (2009) on simulated data.
Reconstruction of the underlying matter density field from a sample of
haloes can also be used to identify and classify features of large-scale
structure (Wang et al. 2009; Wang et al. 2012).

Our relatively low threshold for group and cluster identification retains
enough information about the galaxy distribution to identify components of
filamentary structure. Our elongation probabilities, introduced below, are
a measurement we have devised based on minimal spanning trees to identify
filaments as elongated unions of groups and clusters.

6.1 Identifying and measuring filaments 



Filaments have long been noted (e.g. Kuhn & Uson 1982) as prominent
features of redshift surveys, and are apparent in deep optical images of
fields containing massive clusters (e.g. Kodama et al. 2001; Ebeling,
Barrett & Donovan 2004). They have a statistically significant presence
(Bhavsar & Ling 1988) and become prominent when the galaxy distribution is
examined on scales above 2 Mpc (Einasto et al. 1984). However, no entirely
algorithmic process has yet been employed to produce a large catalogue of
filaments in real data. A range of algorithms has been suggested for the
detection of filamentary structure, including minimal spanning trees
(Colberg 2007), the Delaunay tessellation field estimator (van de Weygaert
& Schaap 2009), modeling as a marked point process (Stoica et al. 2005),
skeleton (Novikov, Colombi & Doré 2006), multiscale morphology filter
(Aragon-Calvo et al. 2007), "DisPerSE" (Sousbie, Pichon & Kawahara 2011)
and galaxy axis orientations (Pimbblet 2005). These algorithms have mostly
been applied to simulated data. Observationally, the difficulty lies in the
limitations of real data (completeness, peculiar velocities and projection
effects), the lower density of filaments when compared with clusters, and
the possibility that simulated filaments are not accurate analogues of
actual filaments. Stoica, Martinez & Saar (2010) suggest that model
filaments are shorter than real filaments, and do not form an extended
network.

Investigations with both real and simulated data have helped define the
properties and morphologies of typical filaments. In simulated data,
filaments are typically 2 h^-1 Mpc wide (Aragon-Calvo, van de Weygaert &
Jones 2010), tend to have lengths of ~ 15 h^-1 Mpc (Colberg 2007), with a
presence that is statistically significant up to a length of ~ 110 Mpc
(Pandey et al. 2011). Aragon-Calvo, van de Weygaert & Jones (2010) find
that more massive clusters host more filamentary connections, and this is
supported in real data by Pimbblet, Drinkwater & Hawkrigg (2004), hereafter
PDH. Simulated filaments have been morphologically classified by Colberg,
Krughoff & Connolly (2005; hereafter CKC), with results that are also
supported in real data by PDH.

We now introduce a metric called the "elongation probability", which we
then use to identify candidate filamentary structures from the MSPM
catalogue.



6.2 Elongation probability

To measure the elongation of a group or cluster's environment, we examine
the configuration of nearby groups and clusters by computing their minimal
spanning tree (MST; e.g. Barrow, Bhavsar & Sonoda 1985). An MST is a graph
that joins together N input particles with N − 1 edges such that the total
edge length is minimised, without closed circuits. Overdensities may be
identified by the removal of long edges (e.g. Bhavsar & Splinter 1996), and
the distribution of edge lengths may be used to estimate the Hausdorff
dimension (e.g. Martinez et al. 1990). Adjacent edge angles can be used to
measure linearity (Krzewina & Saslaw 1996). In our approach, the
distribution of angles made by MST edges with a preferred direction is used
to calculate an elongation probability P_e.
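For readers unfamiliar with MSTs, the construction can be sketched with Prim's algorithm on two-dimensional positions. The text does not state which MST algorithm was used, so this is an assumed, illustrative choice and the function name is hypothetical.

```python
import math


def minimal_spanning_tree(points):
    """Return MST edges as (i, j) index pairs for a list of 2D points (Prim)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # shortest known link from the tree to each node
    best_from = [-1] * n        # tree node that provides that link
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Attach the node that is currently closest to the growing tree.
        k = min((j for j in range(n) if not in_tree[j]), key=lambda j: best_dist[j])
        edges.append((best_from[k], k))
        in_tree[k] = True
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[k], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = k
    return edges
```

For N input points the function returns N − 1 edges and never closes a loop, matching the definition above. For the handful of groups and clusters in a typical field, the quadratic cost of this simple version is negligible.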

In two dimensions, an MST for an unelongated (isotropic) configuration of
structures with n edges can be expected to produce n/2 angles less than
π/4 (for example). Using this expected count of angles below the threshold
in angle, the actual count and a Gaussian probability density, we obtain
P_e, calculated in the same way as the overdensity probabilities. For each
candidate direction sampled, elongation probabilities are calculated as the
average over five angular thresholds linearly spaced between π/20 and π/4.
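The calculation for a single candidate direction might be sketched as below. The overdensity-probability formula is defined earlier in the paper and is not reproduced in this section, so the sketch makes a labelled assumption: the probability is taken as a Gaussian cumulative distribution evaluated at (observed − expected)/σ, with a Poisson-like σ = sqrt(expected). Function names are hypothetical.

```python
import math


def edge_angle(edge_vec, direction):
    """Acute angle in [0, pi/2] between an undirected edge and a direction."""
    ex, ey = edge_vec
    dx, dy = direction
    cos_a = abs(ex * dx + ey * dy) / (math.hypot(ex, ey) * math.hypot(dx, dy))
    return math.acos(min(1.0, cos_a))


def gaussian_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def elongation_probability(edge_vecs, direction, n_thresholds=5):
    """Average probability over angular thresholds from pi/20 to pi/4."""
    n = len(edge_vecs)
    if n == 0:
        raise ValueError("at least one MST edge is required")
    angles = [edge_angle(e, direction) for e in edge_vecs]
    lo, hi = math.pi / 20, math.pi / 4
    probs = []
    for i in range(n_thresholds):
        theta = lo + i * (hi - lo) / (n_thresholds - 1)
        # Isotropic case: angles uniform on [0, pi/2], so n/2 lie below pi/4.
        expected = n * theta / (math.pi / 2)
        observed = sum(a < theta for a in angles)
        sigma = math.sqrt(expected)  # ASSUMPTION: Poisson-like scatter
        probs.append(gaussian_cdf((observed - expected) / sigma))
    return sum(probs) / len(probs)
```

With five collinear edges, the direction along the edges gives a value well above the 0.875 threshold used later, while the perpendicular direction gives a value below 0.5.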

Fifteen candidate values of P_e are found, one for each of the fifteen
directions defined by the vectors between pairs of the following six
locations:

(i) the average (group and cluster) position, 

(ii) the furthest structure from the average, 

(iii) the structure separated from the average by one quarter of 
the distance between the average and furthest positions, 

(iv) the structure separated from the average by half the distance 
between the average and furthest positions, 

(v) the structure separated from the average by three quarters of 
the distance between the average and furthest positions, and 

(vi) the position of the structure ranked as having as many struc- 
tures further away as closer to the average structure position. 

This sampling of a limited number of directions reduces computa- 
tional effort and is analogous to our use of galaxy positions as an 
adaptive sampling grid when identifying groups and clusters. In our 
implementation we are generating MSTs on group and cluster po- 
sitions rather than individual galaxies, so our MSTs only have six 
nodes on average. For these sparse MSTs, the optimal direction will 
usually align with one of the edges, which will in almost all cases 
be one of the fifteen directions we sample. However, a sampling of 
all directions would allow a better estimate of P_e, and should be
considered in any future implementation of this approach. 
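The six reference locations yield 6 × 5 / 2 = 15 pairwise directions. The sketch below shows one possible reading of the list above; it assumes that locations (iii) to (v) are the structures whose distance from the average position is closest to one quarter, one half and three quarters of the maximum distance, and that location (vi) is the median-ranked structure. These interpretations and the function names are illustrative.

```python
import math
from itertools import combinations


def reference_locations(positions):
    """Six reference locations for a list of (x, y) structure positions."""
    n = len(positions)
    mean = (sum(p[0] for p in positions) / n, sum(p[1] for p in positions) / n)
    by_dist = sorted(positions, key=lambda p: math.dist(p, mean))
    furthest = by_dist[-1]
    d_max = math.dist(furthest, mean)

    def nearest_to_fraction(f):
        # ASSUMPTION: structure whose distance from the mean is closest to f * d_max.
        return min(positions, key=lambda p: abs(math.dist(p, mean) - f * d_max))

    median = by_dist[(n - 1) // 2]  # as many structures further away as closer
    return [mean, furthest, nearest_to_fraction(0.25),
            nearest_to_fraction(0.5), nearest_to_fraction(0.75), median]


def candidate_directions(positions):
    """Up to fifteen direction vectors between pairs of reference locations."""
    directions = []
    for a, b in combinations(reference_locations(positions), 2):
        vec = (b[0] - a[0], b[1] - a[1])
        if vec != (0.0, 0.0):  # coincident locations give no direction
            directions.append(vec)
    return directions
```

In sparse fields several of the six locations can coincide, so fewer than fifteen distinct directions may result; zero-length vectors are dropped here.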

P_e is calculated for each of these directions, and the maximum value is
adopted. This maximum P_e is always greater than 0.5, and a P_e
distribution for MSTs comprising many nodes typically peaks at ~ 0.75. We
consider a distribution to be significantly elongated if P_e > 0.875. This
is an arbitrary threshold and we have not investigated alternative values.
In the current study it demonstrates the utility of coarse-grained mapping
approaches such as MSPM for filament finding. Any further work should
explore the effect of a P_e threshold on purity and completeness, as well
as other measures of elongation such as the inertia tensor (e.g.
Ragone-Figueroa et al. 2010).

Table 4. Size and purity of filament samples.

n_min    N_algorithm    N_likely    Purity
3        100            53          53%
4        25             19          76%

Numbers of algorithmically identified (N_algorithm) and likely (N_likely)
filaments contained within, determined by the minimum number of scales
n_min with elongation probabilities greater than 0.875.

6.3 MSPM filaments 

Elongation probabilities are calculated around each of our groups and
clusters, using the positions of neighbouring groups and clusters. An
elongation probability is found using structures within each of a set of
five radii on the sky: 2 to 10 h^-1 Mpc in steps of 2 h^-1 Mpc. Only the
sky positions of structures within 10 h^-1 Mpc in the line of sight are
included, meaning that we are less sensitive to filaments with axes aligned
close to the line of sight. We are not sensitive to thin bridges between
groups and clusters less than ~ 2 h^-1 Mpc long. An example of a filament
traced by MSPM groups and clusters is shown in Figure 13.

Filaments are identified as unions of MSPM groups and clusters that are
configured such that their elongation probability is greater than 0.875 for
a minimum number n_min of the five sampled scales. The size (after removal
of overlapping volumes) and purity of the resultant filament sample are
determined by n_min (Table 4). Numbers of likely filaments contained by
these samples are determined by visual inspection.

If four of the elongation probabilities are required to exceed 
0.875, an algorithmically-selected sample of filaments is created 
with a purity of 76%, but with a sample size of only 25. For our 
catalogue and filament morphology work, we have used the 53 apparent
filaments identified by visual inspection from the n_min = 3
sample (which includes the n_min = 4 sample). The 47 fields that
remain were judged to be chance alignments of groups and clus- 
ters that do not appear on closer visual inspection to be subunits of 
filamentary structure. 
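The selection rule and the purity figures in Table 4 can be expressed compactly. The sketch below assumes that one elongation probability per sampled radius has already been computed for a field; the function names are hypothetical.

```python
PE_THRESHOLD = 0.875
RADII = (2, 4, 6, 8, 10)  # sampled radii on the sky, in h^-1 Mpc


def n_scales_elongated(pe_by_scale, threshold=PE_THRESHOLD):
    """Number of sampled scales whose elongation probability exceeds the threshold."""
    return sum(p > threshold for p in pe_by_scale)


def is_filament_candidate(pe_by_scale, n_min):
    """True if at least n_min of the sampled scales are significantly elongated."""
    return n_scales_elongated(pe_by_scale) >= n_min


def purity(n_likely, n_algorithm):
    """Fraction of algorithmic candidates judged to be likely filaments."""
    return n_likely / n_algorithm


# Purity values quoted in Table 4.
print(f"n_min = 3: {100 * purity(53, 100):.0f} per cent")
print(f"n_min = 4: {100 * purity(19, 25):.0f} per cent")
```

Raising n_min from 3 to 4 trades sample size (100 to 25 candidates) for purity (53 to 76 per cent).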

There are many more filaments manifest in the data that we do 
not detect, so our filament catalogue's completeness is poor. Many 
filaments may fail our P > 0.5 requirement since they are not 
as dense as clusters. In other cases, the presence of neighbouring 
structures within 10 h^-1 Mpc may lower filament elongation probabilities
below our threshold.



Our catalogue of 53 filaments is presented in Table 5. Each filament
is identified by the ID number of the MSPM group or cluster
that lies at the centre of the field hosting the filament. No filaments 
are detected at z > 0.13 because of incompleteness. 

6.4 Filament morphologies 

Our subjective morphological classification is based on the scheme 
introduced by PDH and CKC. For each filamentary field, projec- 
tions of galaxy positions onto two orthogonal planes are inspected. 
In our work, one of these planes is always the sky, since we have 
only used sky positions in the calculation of elongation probabili- 
ties, favouring filaments that are oriented perpendicular to the line 
of sight. The other plane is perpendicular to the sky and parallel 
with the length of the filament. Our inspections are limited to fields
with transverse and line-of-sight radii of 10 h^-1 Mpc that may not
always capture the endpoints of the filament. A selection of filaments
with different morphologies is shown in Figure 14.

In either plane, the projected configuration of galaxies is clas- 
sified as straight, curved, uniform or irregular. 

(i) Straight: galaxies form either a line or lines that are not 
curved. 

(ii) Curved: a line that is continuously bent (not simply 
crooked) into either a "C"- or "S"- shape. 

(iii) Uniform: uniformly distributed galaxies that do not form a 
clear line. 

(iv) Irregular: irregular distribution of galaxies containing 
large density fluctuations that obscure the linear structure of the 
filament. 

Following PDH, each filament is assigned a morphological 
type based on its appearance in two orthogonal planes. 

(i) Type I (straight): both are straight (e.g., Figures 14a, d, g).

(ii) Type II (warped): at least one is curved, with neither being
uniform or irregular (e.g., Figures 14b, e, h).

(iii) Type III (sheet): one (and only one) is uniform, with the
other being either straight or curved.

(iv) Type IV (uniform): both are uniform.

(v) Type V (irregular): both are irregular (e.g., Figures 14c, f, i).

Some fields contain multiple filaments, and in these fields we 
classify the filament containing the groups or clusters that cause a 
high elongation probability. Fields are inspected independently by 
two authors (AGS and KAP) with a 4 per cent disagreement rate. 
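The mapping from the two per-plane classifications to a morphological type can be written as a small lookup. This sketch encodes only the five rules listed above; combinations that the list does not cover (for example, straight in one plane and irregular in the other) return `None`, which is a choice made here and not something the scheme specifies.

```python
VALID = {"straight", "curved", "uniform", "irregular"}


def morphological_type(plane_a, plane_b):
    """Return 'I' to 'V' for two per-plane classifications, or None if uncovered."""
    pair = {plane_a, plane_b}
    if not pair <= VALID:
        raise ValueError(f"unknown classification in {pair}")
    if pair == {"straight"}:
        return "I"    # both straight
    if "curved" in pair and pair <= {"straight", "curved"}:
        return "II"   # at least one curved, neither uniform nor irregular
    if pair == {"uniform"}:
        return "IV"   # both uniform
    if "uniform" in pair and pair <= {"uniform", "straight", "curved"}:
        return "III"  # exactly one uniform, the other straight or curved
    if pair == {"irregular"}:
        return "V"    # both irregular
    return None       # combination not covered by the listed rules
```

For example, `morphological_type("curved", "straight")` gives `"II"`, and the result does not depend on the order of the two planes.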

The division of our filament sample by morphology is shown in Table 6 and
a selection of filaments with different morphologies is shown in Figure 14.
Our results regarding the relative abundance of filament types are
consistent with PDH and CKC within uncertainties: most of our filaments are
Type I (straight) or II (warped), with the remainder classified as Type V
(irregular). The PDH sample is derived by visual inspection, so our
technique is sensitive to the prominent filament morphologies apparent in
real data.

Table 5. Catalogue of MSPM filaments in SDSS DR7.

Locations, properties and morphologies of MSPM filaments. Figures showing
each filament can be found in the online edition of the Journal, and
three-dimensional visualisations can be found at
http://www.physics.usyd.edu.au/sifa/Main/MSPM/ .
Column (1): ID of central MSPM group or cluster; (2) to (4): position;
(5): highest elongation probability within 10 h^-1 Mpc; (6): count of
groups and clusters within a 10 h^-1 Mpc radius on the sky and 10 h^-1 Mpc
in the line of sight (cylindrical aperture); (7): morphological type
(Section 6.4).

Figure 13. A demonstration of our filament detection method, with a
filament identified in a field centred on MSPM structure 1063. Filled
circles are r < 17.77 galaxies, plusses are galaxies at positions with
P − S > 0.5 and large crosses are MSPM groups and clusters. (a) Objects
within a transverse radius of 10 h^-1 Mpc at z = 0.03470 and within a
line-of-sight radius Δz = 0.005. (b) The same field of view, but with
plusses and large crosses omitted.

Table 6. Filament numbers and fractions by morphology.

Type    This Study (number)    This Study (per cent)    PDH (per cent)
I       26                     49 ± 10                  37 ± 3
II      16                     30 ± 8                   34 ± 3
III     -                      -                        4 ± 1
IV      -                      -                        0.8 ± 0.5
V       11                     21 ± 6                   26 ± 3

The abundance of each filament type in our study compared with PDH. All
uncertainties are Poissonian.
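The percentages and Poissonian uncertainties quoted for this study in Table 6 can be reproduced from the raw counts. The sketch assumes the uncertainty on each fraction is sqrt(N)/N_total, ignoring any uncertainty on the total; this assumption reproduces the tabulated values.

```python
import math

# Numbers of filaments of each type in this study (Table 6).
counts = {"I": 26, "II": 16, "V": 11}
total = sum(counts.values())  # 53 filaments


def fraction_with_poisson_error(count, total):
    """Percentage and Poissonian uncertainty, assuming error = sqrt(count)/total."""
    fraction = 100.0 * count / total
    error = 100.0 * math.sqrt(count) / total
    return fraction, error


for ftype, count in counts.items():
    fraction, error = fraction_with_poisson_error(count, total)
    print(f"Type {ftype}: {fraction:.0f} ± {error:.0f} per cent")
```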

Our Type I fraction of 49 per cent is marginally higher than that found by
PDH (37 per cent), who assess volumes that allow greater filament
curvature. PDH record filaments up to a length of ≈ 45 h^-1 Mpc, and find
that Type I is the dominant morphology for short (< 10 h^-1 Mpc) filaments.
Since our search is restricted to fields with radii of 10 h^-1 Mpc, we do
not detect filaments (or filament segments) longer than 20 h^-1 Mpc. Our
elongation probability approach is also more efficient at detecting
straight filaments than filaments with more complex morphologies.



7 SUMMARY 

We have designed and implemented a new algorithm, multiscale 
probability mapping, for the detection of structures in the galaxy 
distribution. MSPM can be made sensitive to any chosen range of 
scales and identifies member galaxies. Our work with SDSS DR7 
data demonstrates its abilities: 

(i) by finding groups and clusters with a range of sizes we have 
quantified a radius-velocity dispersion trend not highlighted in pre- 
vious work, 

(ii) by identifying groups and clusters through their statistical 
significance we are able to set a relatively low threshold, 

(iii) MSPM's sensitivity to a user-defined scale range allows us 
to produce a coarse-grained representation of the galaxy distribu- 
tion with a user-defined grain size, and 

(iv) using our group and cluster catalogue, we have demon- 
strated a technique to identify filaments algorithmically with a false 
discovery rate of less than 50 per cent. 

Our filament catalogue omits many filaments present in the 
data, and we lack an objective way to quantify their morphology. 
Future approaches will address these challenges. 

The data products made available in this work are a catalogue 
of 10443 groups and clusters at 0.025 < z < 0.24 and a catalogue 
of 53 filaments. The morphological similarity of our filaments to 
those of PDH shows that algorithmic filament searches have the 
potential to produce results comparable to visual inspections of real 
data. 



Panel titles: (a) ID299, z = 0.0404, Type I; (b) ID84, z = 0.0446, Type II;
(c) ID1511, z = 0.0264, Type V; (d) ID3305, z = 0.0645, Type I; (e) ID3861,
z = 0.0547, Type II; (f) ID3094, z = 0.0695, Type V.

Figure 14. Selected fields containing filaments, centred on MSPM groups and
clusters. Each panel shows r < 17.77 galaxies within a 20 h^-1 Mpc ×
20 h^-1 Mpc square on the sky and within a line-of-sight radius Δz = 0.005
(~ 14 h^-1 Mpc). (a)-(c) Examples of types I, II and V at z < 0.05. (d)-(f)
Types I, II and V at z > 0.05. Field 3861 shows an example of "S-shaped"
curvature. (g)-(i) Types I, II and V at z > 0.1. Figures showing all 53
filaments can be found in the online edition of the Journal, and
three-dimensional visualisations can be found at
http://www.physics.usyd.edu.au/sifa/Main/MSPM/ .